Abstract

We present a novel method for controlling the k-familywise error rate (k-FWER) in the linear regression setting using the knockoffs framework first introduced by Barber and Candes. Our procedure, which we also refer to as knockoffs, can be applied with any design matrix with at least as many observations as variables, and does not require knowing the noise variance. Unlike other multiple testing procedures, which act directly on p-values, knockoffs is specifically tailored to linear regression and implicitly accounts for the statistical relationships between hypothesis tests of different coefficients. We prove that knockoffs controls the k-FWER exactly in finite samples and show in simulations that it provides superior power to alternative procedures over a range of linear regression problems. We also discuss extensions to controlling other Type I error rates such as the false discovery exceedance, and use the procedure to identify candidates for mutations conferring drug resistance in HIV.

Keywords: k-familywise error rate; knockoffs; multiple testing; linear regression; Lasso; negative binomial distribution.


1 Introduction 

Multiple testing has received increasing attention with the advent of fields like genetics, technology, and astronomy, which produce very high-dimensional datasets. The increasing number of hypotheses being simultaneously tested has motivated extensive research into procedures that control the errors that abound when each hypothesis is tested only individually. For instance, the canonical criterion of the familywise error rate (FWER) is the probability of falsely rejecting any of the true null hypotheses. A number of more modern landmark works have introduced new Type I error rates that allow for higher power by relaxing the FWER, including the false discovery rate (FDR) (Benjamini and Hochberg, 1995), the k-FWER (Hommel and Hoffmann, 1988; Lehmann and Romano, 2005), and the false discovery exceedance (FDX) (Genovese and Wasserman, 2004; van der Laan et al., 2004). Each one has a different interpretation, but all control an error rate defined over all hypotheses being tested, so that conclusions can be drawn by considering rejected hypotheses together.

Among multiple testing problems, some of the most important deal with finding relationships between variables. Such investigations are often posed as a linear model

$$ y = X\beta + z, $$

where $X = [X_1, \ldots, X_p] \in \mathbb{R}^{n \times p}$ is a design matrix, $\beta \in \mathbb{R}^p$ is a signal vector of interest, and $z \in \mathbb{R}^n$ is the error term. The hypotheses of interest concern which variables $X_j$, after controlling for all other variables, contribute to the model, that is, which coefficients $\beta_j$ are nonzero. Because they can encode correlations between variables, linear models capture far more real-life examples than sequence models. Examples abound particularly in genetics, where one searches for relationships between parts of the genome, often in the form of single nucleotide polymorphisms or expression levels, and continuous variables such as health factors or drug response. Unfortunately, due to the dependence among the variables in the linear model, their respective tests do not in general exhibit any of the simple dependence structures, such as independence or positive dependence, that are required by many of the most powerful existing procedures.

In this work we focus on controlling the k-FWER, the probability of making at least k false discoveries, in the context of linear models. Our method uses the knockoffs framework introduced by Barber and Candes (2015). The idea of knockoffs is to carefully construct artificial variables that serve as controls for the original variables. Barber and Candes show that these controls are easy to construct and can be used to automatically account for variable dependence, providing finite-sample FDR control for general design matrices without knowledge of the noise variance. Controlling the FDR can be highly desirable in a high-power setting, but results can be hard to interpret when few discoveries are made, as the realized false discovery proportion may be highly variable. The k-FWER, which in the case of k = 1 reduces to the standard FWER, always has a clear interpretation by explicitly bounding the probability of k or more false discoveries, making it a useful criterion in all settings, as evidenced by its wide acceptance in the scientific community. The k-FWER also provides a fundamental building block for other Type I error rates, such as the FDX and the per-family error rate (PFER), as we will discuss in Section 4. We leverage the attractive features of the knockoffs framework to construct a novel procedure for controlling the k-FWER that implicitly accounts for the exact dependence structure in linear regression problems. In particular, we prove finite-sample k-FWER control for general design matrices without any knowledge of the noise variance, and show in simulations that the power can be substantially greater than that of state-of-the-art alternatives.

Much previous work has studied controlling the k-FWER under varying assumptions on the statistical dependence among the hypothesis test statistics or p-values. The bulk of such work has dealt with procedures that act directly on the p-values. When there are more observations than variables and the noise is i.i.d. Gaussian, ordinary least squares regression generates dependent t-statistics for all variables, allowing those procedures that can account for the dependence structure to be applied to the associated p-values. Unfortunately, the joint distribution of such p-values does not generally satisfy popular dependence assumptions such as positive regression dependence on subsets (Benjamini and Yekutieli, 2001) or multivariate total positivity (Karlin and Rinott, 1980). Furthermore, many of the procedures that can account for general dependence structures do so nonparametrically through resampling. However, resampling procedures tend either to require extra assumptions such as subset pivotality (Westfall and Young, 1989), which do not hold in general in the regression setting, or to provide exact control only asymptotically (Romano and Wolf, 2007).

We mention here some work on controlling the k-FWER in finite samples and refer the reader to Guo et al. (2014) for a more thorough review. The most popular methods for FWER control are the Bonferroni (Dunn, 1961) and Holm's (Holm, 1979) procedures, neither of which requires assumptions on the dependence among p-values. Under independence, the Bonferroni procedure can be improved using the Sidak correction (Sidak, 1967), or one can employ Hochberg's step-up procedure (Hochberg, 1988). Lehmann and Romano (2005) present step-down procedures generalizing the Bonferroni and Holm procedures, while Romano and Wolf (2007) introduce a generic step-down procedure, all for controlling the k-FWER. Romano and Shaikh (2006) also present step-up procedures for controlling the k-FWER under arbitrary unknown dependence.

To avoid confusion, we point out that the recent work of Lockhart et al. (2014) provides p-values for coefficients in a linear model; however, it deals with a different notion of a null hypothesis than the one used here. In their framework, the null hypotheses are defined sequentially with respect to a growing model: each time the model size is increased by one, the null hypothesis is that the new variable is uncorrelated with the response, conditional only on the variables already included in the model.

The remainder of the paper is structured as follows. Section 2 introduces notation and gives a short introduction to the knockoffs framework. Section 3 describes the knockoffs procedure for control of the k-FWER and proves this control along with tail bounds. Section 4 provides a brief discussion of how the procedure can be used to control the PFER and FDX. Section 5 compares our procedure to state-of-the-art alternatives from the literature, both in terms of practical considerations and power, in a series of simulations. Section 6 demonstrates an implementation on a real dataset from genetics, and Section 7 concludes with discussion and directions for future research.

2 Preliminaries for knockoffs 

In this section, we introduce the knockoffs machinery of Barber and Candes (2015) at the minimal level sufficient for our exposition of k-FWER control. This material is largely borrowed from Barber and Candes (2015). In referring to the knockoffs framework, we always assume that the number of observations n is at least the number of variables p, that the design matrix X has full rank so that the Gram matrix $X^\top X$ is invertible, and that the noise term z has independent Gaussian entries. We briefly emphasize that $n \ge p$ is necessary for the multiple hypothesis testing problem to even be well defined. For any linear regression problem, the “true” coefficient vector is only statistically well defined modulo addition of any vector in the null space of the design matrix. If $p > n$, then the design matrix has a nontrivial null space, so zeros and nonzeros in the coefficient vector can appear and disappear, changing the truth of the null hypotheses, without changing the data-generating process at all. Except for this nondegeneracy assumption, the knockoffs machinery works for general designs X and does not even require knowledge of the noise variance $\sigma^2$.

To start with, again, consider the linear model

$$ y = X\beta + z, $$

where the noise vector z has independent $\mathcal{N}(0, \sigma^2)$ entries, and each column of X has been normalized to have unit $\ell_2$-norm, that is, $\|X_j\|_2 = 1$ for all $1 \le j \le p$. The first step of this method is to construct the knockoff design, denoted $\tilde{X} \in \mathbb{R}^{n \times p}$, that obeys

$$ \tilde{X}^\top \tilde{X} = X^\top X, \qquad X^\top \tilde{X} = X^\top X - \mathrm{diag}(s), \qquad (1) $$

where $s \in \mathbb{R}^p$ has nonnegative entries and the superscript $\top$ denotes matrix transpose hereafter. There are multiple ways to construct this knockoff design. The first equality forces $\tilde{X}$ to have the same correlation structure among its columns as X. In the ideal case of $n \ge 2p$, it can be guaranteed that the 2p column vectors of X and $\tilde{X}$ are jointly linearly independent. By the second equality, for every $1 \le j \le p$, the original variable $X_j$ and its knockoff counterpart $\tilde{X}_j$ have the same correlation with all the other $2p - 2$ variables, namely $X_i, \tilde{X}_i$ for $i \ne j$. At a high level, we can view the knockoff design $\tilde{X}$ as a control group compared with the original design X, which is treated as the case group.
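To make construction (1) concrete, here is a minimal NumPy sketch of the equicorrelated choice $s_j = \min(2\lambda_{\min}(X^\top X), 1)$ with $\tilde{X} = X(I - \Sigma^{-1}\mathrm{diag}(s)) + \tilde{U}C$, where $\Sigma = X^\top X$, $\tilde{U}$ has orthonormal columns orthogonal to the span of X, and $C^\top C = 2\,\mathrm{diag}(s) - \mathrm{diag}(s)\Sigma^{-1}\mathrm{diag}(s)$. This is one of the constructions described by Barber and Candes (2015) as we recall it, restricted for simplicity to the ideal case $n \ge 2p$; the function name `equicorrelated_knockoffs` is hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def equicorrelated_knockoffs(X):
    """Construct a knockoff design X_tilde satisfying (1) with all s_j equal.

    Assumes n >= 2p, X of full column rank, and unit-norm columns.
    """
    n, p = X.shape
    if n < 2 * p:
        raise ValueError("this sketch requires n >= 2p")
    Sigma = X.T @ X                                  # Gram matrix X^T X
    lam_min = np.linalg.eigvalsh(Sigma)[0]           # smallest eigenvalue of Sigma
    s = np.full(p, min(2.0 * lam_min, 1.0))          # equicorrelated choice of s
    D = np.diag(s)
    Sigma_inv_D = np.linalg.solve(Sigma, D)          # Sigma^{-1} diag(s)
    # A = 2 diag(s) - diag(s) Sigma^{-1} diag(s) is PSD because 2 Sigma - diag(s) is PSD.
    A = 2.0 * D - D @ Sigma_inv_D
    A = (A + A.T) / 2.0                              # symmetrize against round-off
    evals, evecs = np.linalg.eigh(A)
    C = np.sqrt(np.clip(evals, 0.0, None))[:, None] * evecs.T   # so that C^T C = A
    # Columns p..2p-1 of a complete QR basis are orthonormal and orthogonal to span(X).
    Q, _ = np.linalg.qr(X, mode="complete")
    U_tilde = Q[:, p:2 * p]
    X_tilde = X @ (np.eye(p) - Sigma_inv_D) + U_tilde @ C
    return X_tilde, s


rng = np.random.default_rng(0)
X = rng.standard_normal((40, 8))
X /= np.linalg.norm(X, axis=0)                       # unit l2-norm columns, as in the text
X_tilde, s = equicorrelated_knockoffs(X)
print(np.abs(X_tilde.T @ X_tilde - X.T @ X).max())  # round-off-level deviation from (1)
```

Because $\tilde{U}^\top X = 0$, the cross terms vanish and both identities in (1) hold up to floating-point error; other choices of s (for example, ones obtained by semidefinite programming) fit the same template.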

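Denote by $X_{\mathrm{ko}} = [X, \tilde{X}] \in \mathbb{R}^{n \times 2p}$ the concatenation of the original design and the knockoff design. With $X_{\mathrm{ko}}$ in hand, the next step is to generate statistics for each variable. One way to do so, suggested in Barber and Candes (2015), is to fit the entire Lasso regularization path on the augmented design,

$$ \hat{\beta}(\lambda) = \operatorname*{argmin}_{b \in \mathbb{R}^{2p}} \; \tfrac{1}{2} \| y - X_{\mathrm{ko}} b \|_2^2 + \lambda \| b \|_1, \qquad (2) $$

and to let $Z_j$ be the first (largest) $\lambda$ at which $\hat{\beta}_j$ is nonzero. Formally,

$$ Z_j = \sup\{\lambda : \hat{\beta}_j(\lambda) \neq 0\}. $$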

As pointed out in the reference paper, many alternative statistics, including some based on least squares, least angle regression (Efron et al., 2004), and sorted-$\ell_1$-penalized estimation (Bogdan et al., 2015; Su and Candes, 2015), can be used instead as long as they obey the sufficiency and antisymmetry properties defined therein. Defining $\tilde{Z}_j$ analogously for each knockoff variable $\tilde{X}_j$, the knockoff statistics (using slightly different notation than in the original paper) are

$$ W_j = \max\{Z_j, \tilde{Z}_j\}, \qquad \chi_j = \mathrm{sgn}(Z_j - \tilde{Z}_j), $$

where $\mathrm{sgn}(x) = -1, 0, 1$ if $x < 0$, $x = 0$, $x > 0$, respectively. The following result, due to Barber and Candes (2015), characterizes the joint distribution of the null $\chi_j$. We say j is a true null when $\beta_j = 0$ and a false null otherwise.
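The statistics $Z_j$, $\tilde{Z}_j$, $W_j$ and $\chi_j$ can be sketched in a few lines. Assumptions: instead of an exact path algorithm (such as the LARS-based Lasso path), this sketch swaps in plain cyclic coordinate descent with warm starts on a geometric grid of $\lambda$ values, so each $Z_j$ is only approximated by the largest grid value at which $\hat{\beta}_j$ is nonzero; `lasso_entry_lambdas` and `knockoff_statistics` are hypothetical helper names. The toy usage takes an orthonormal X with $s = 1$, for which any orthonormal complement is a valid knockoff design under (1).

```python
import numpy as np


def lasso_entry_lambdas(A, y, n_lambdas=100, eps=1e-3, max_sweeps=200, tol=1e-10):
    """Approximate Z_j = sup{lambda : beta_j(lambda) != 0} for each column of A.

    Minimizes 0.5 * ||y - A b||^2 + lambda * ||b||_1 by cyclic coordinate descent
    with warm starts, from lambda_max down to eps * lambda_max on a geometric grid,
    and records the first (largest) grid value at which each coefficient is nonzero.
    """
    m = A.shape[1]
    col_sq = np.sum(A ** 2, axis=0)
    lam_max = np.max(np.abs(A.T @ y))        # for lambda >= lam_max the solution is 0
    lambdas = lam_max * np.geomspace(1.0, eps, n_lambdas)
    b = np.zeros(m)
    r = np.array(y, dtype=float)             # residual y - A b
    Z = np.zeros(m)                          # stays 0 if a column never enters on the grid
    for lam in lambdas:
        for _ in range(max_sweeps):
            max_change = 0.0
            for j in range(m):
                rho = A[:, j] @ r + col_sq[j] * b[j]
                new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
                delta = new - b[j]
                if delta != 0.0:
                    r -= A[:, j] * delta
                    b[j] = new
                    max_change = max(max_change, abs(delta))
            if max_change < tol:
                break
        newly_entered = (b != 0) & (Z == 0)
        Z[newly_entered] = lam
    return Z


def knockoff_statistics(X, X_tilde, y, **kwargs):
    """W_j = max(Z_j, Z~_j) and chi_j = sgn(Z_j - Z~_j) from the augmented Lasso path (2)."""
    p = X.shape[1]
    Z_all = lasso_entry_lambdas(np.hstack([X, X_tilde]), y, **kwargs)
    Z, Z_tilde = Z_all[:p], Z_all[p:]
    return np.maximum(Z, Z_tilde), np.sign(Z - Z_tilde)


rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((30, 6)))
X, X_tilde = Q[:, :3], Q[:, 3:]              # orthonormal X; s = 1 knockoffs satisfy (1)
y = 5.0 * X[:, 0] + 0.1 * rng.standard_normal(30)
W, chi = knockoff_statistics(X, X_tilde, y)
print(W, chi)
```

With a strong signal on $X_1$, the original column enters the path long before its knockoff, so $\chi_1 = +1$ with a large $W_1$, matching the intuition given after Lemma 1.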

Lemma 1 (Barber and Candes (2015)). Conditional on all $W_j$ and all false null $\chi_j$, all true null $\chi_j$ are jointly independent and uniformly distributed on $\{-1, 1\}$.

This simple lemma is very helpful in proving k-FWER control. Its proof follows from the symmetry between $X_j$ and $\tilde{X}_j$ when $\beta_j = 0$, which is provided by the construction (1). The lemma shows that $\chi_j$ can be interpreted as a one-bit p-value, in the sense that it is equally likely to take the values 1 and $-1$ if $\beta_j = 0$. In fact, when $\beta_j = 0$, the knockoff symmetry characterized in (1) makes $X_j$ and its knockoff counterpart $\tilde{X}_j$ exchangeable in the Lasso path (2). Hence, $X_j$ and $\tilde{X}_j$ are equally likely to enter the Lasso path first. Conversely, if $\beta_j \neq 0$, then $X_j$ is likely to enter before $\tilde{X}_j$, so that $\chi_j = 1$. Thus a large $W_j$ and a positive $\chi_j$ provide evidence against the jth null hypothesis $H_{0,j}: \beta_j = 0$.


3 k-familywise error rate control

Inspired by the interpretation of the statistics $W_j$ and $\chi_j$, it is reasonable to reject hypotheses with positive signs $\chi_j$ and large $W_j$. Parameterized by a positive integer $v$, the knockoffs procedure for controlling the k-FWER is as follows.

Step 1. Denote by $W_{\rho(1)} \ge W_{\rho(2)} \ge \cdots \ge W_{\rho(p)}$ the order statistics of W, where $\rho(1), \ldots, \rho(p)$ is a permutation of $1, \ldots, p$.

Step 2. Let $j^*$ be the index of the $v$th $-1$ in the sequence $\chi_{\rho(1)}, \ldots, \chi_{\rho(p)}$. If fewer than $v$ negatives appear, set $j^* = p$.

Step 3. Reject all the null hypotheses $H_{0,\rho(j)}$ for which $j \le j^*$ and $\chi_{\rho(j)} = +1$.

More compactly, define the threshold

$$ T_v = \sup\{t > 0 : \#\{j : W_j \ge t, \ \chi_j = -1\} = v\}, $$

with the usual convention that $\sup \emptyset = -\infty$. Ties among the $W_j$ need not be accounted for, since all $W_j$ are distinct with probability 1. Then the knockoffs procedure rejects all $H_{0,j}$ with $W_j \ge T_v$ and $\chi_j = +1$.
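Steps 1 to 3 translate directly into code. The following is a minimal sketch (the function name `knockoffs_kfwer_rejections` is hypothetical); it uses 0-based indices, and it treats $v = 0$ as rejecting nothing, which matches $T_0 = +\infty$ under the threshold definition above.

```python
import numpy as np


def knockoffs_kfwer_rejections(W, chi, v):
    """Indices j rejected by the knockoffs procedure with parameter v (Steps 1-3)."""
    W = np.asarray(W, dtype=float)
    chi = np.asarray(chi)
    if v <= 0:
        return np.array([], dtype=int)          # T_0 = +inf: nothing is rejected
    order = np.argsort(-W, kind="stable")       # Step 1: rho(1), ..., rho(p) by decreasing W
    neg_positions = np.flatnonzero(chi[order] == -1)
    if len(neg_positions) >= v:
        j_star = neg_positions[v - 1]           # Step 2: 0-based position of the v-th -1
    else:
        j_star = len(W) - 1                     # fewer than v negatives: j* = p
    head = order[: j_star + 1]
    return np.sort(head[chi[head] == 1])        # Step 3: positive signs up to j*


W = np.array([9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0])
chi = np.array([1, 1, -1, 1, -1, 1, 1])
print(knockoffs_kfwer_rejections(W, chi, v=2))   # [0 1 3]
```

In this toy input the second $-1$ sits at $W = 5$, so $T_2 = 5$ and the rejections are exactly the indices with $W_j \ge 5$ and $\chi_j = +1$, as in the compact threshold form.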

Before characterizing the distribution of false discoveries made by the knockoffs procedure, we define some notation. Let $\mathcal{H}_0 = \{1 \le j \le p : \beta_j = 0\}$ be the set of true null hypotheses, and let NB(m, q) denote a negative binomial random variable, which counts the number of successes before the mth failure in a sequence of independent Bernoulli trials with success probability q.

Lemma 2. For any integer $v \ge 1$, the false discovery number

$$ V = \#\{j \in \mathcal{H}_0 : W_j \ge T_v \text{ and } \chi_j = +1\} $$

is stochastically dominated by NB(v, 1/2).

Proof of Lemma 2. First, we prove this lemma in the case where $\mathcal{H}_0 = \{1, \ldots, p\}$, that is, $\beta_j = 0$ for all j. Conditional on all $W_j$, Lemma 1 implies that $\chi_{\rho(1)}, \ldots, \chi_{\rho(p)}$ are independent and each takes the values +1 and $-1$ with probability 1/2. Note that the permutation $\rho$ is deterministic conditional on the $W_j$. Recognizing that V is the number of positive $\chi_{\rho(j)}$ before the $v$th negative or the end of the sequence (the pth trial), whichever comes first, we see that V is an early-stopped negative binomial random variable. In the general case, we condition additionally on the false null $\chi_j$, so that by Lemma 1 the true null signs remain independent fair coin flips. The false nulls can only insert extra $-1$'s (which hasten stopping) or extra $+1$'s (which do not count toward V) into the sequence, so the process on the nulls stops no later than it would when $\mathcal{H}_0 = \{1, \ldots, p\}$. Therefore, V is always stochastically dominated by NB(v, 1/2). □
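Lemma 2 can be checked by simulation under the global null, where Lemma 1 reduces the ordered signs to i.i.d. fair coin flips. This is a Monte Carlo sketch only: it simulates the signs directly rather than running knockoffs on real designs, and `simulate_V_global_null` is a hypothetical name. It also relies on the fact that NumPy's `negative_binomial(n, p)` counts failures before n successes, which for success probability 1/2 has the same law as NB(v, 1/2) in the text.

```python
import numpy as np


def simulate_V_global_null(p, v, n_rep, rng):
    """Draw V under the global null: the ordered signs chi_rho(1), ..., chi_rho(p)
    are i.i.d. uniform on {-1, +1} by Lemma 1."""
    V = np.empty(n_rep, dtype=int)
    for r in range(n_rep):
        signs = rng.choice(np.array([-1, 1]), size=p)
        neg = np.flatnonzero(signs == -1)
        stop = neg[v - 1] if len(neg) >= v else p - 1   # v-th -1, or end of sequence
        V[r] = np.count_nonzero(signs[: stop + 1] == 1)
    return V


rng = np.random.default_rng(2)
v = 5
V = simulate_V_global_null(p=200, v=v, n_rep=20000, rng=rng)
NB = rng.negative_binomial(v, 0.5, size=20000)   # reference draws of NB(v, 1/2)
print(V.mean(), NB.mean())                         # compare with E NB(v, 1/2) = v
print(np.mean(V >= 10), np.mean(NB >= 10))         # compare tail probabilities
```

Since $p = 200 \gg v = 5$, early stopping essentially never occurs, which illustrates the tightness remark that follows.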

The stochastic upper bound in Lemma 2 is tight in the following sense: the distribution of V can be made arbitrarily close to NB(v, 1/2) under the global null by taking $p \gg v$, since in this case at least $v$ negative $\chi_j$ will appear in the sequence with high probability. Next we present the main result, which follows immediately from Lemma 2 and the negative binomial cumulative distribution function.


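Theorem 1. For any integer $k \ge 1$ and significance level $0 < \alpha < 1$, let $v$ be the largest integer satisfying

$$ \mathbb{P}\big(\mathrm{NB}(v, 1/2) \ge k\big) = \sum_{i=k}^{\infty} \binom{v+i-1}{i} \, 2^{-(v+i)} \le \alpha. \qquad (3) $$

Then the knockoffs procedure with parameter $v$ controls the k-FWER at level $\alpha$, that is, $\mathbb{P}(V \ge k) \le \alpha$.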
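Condition (3) is easy to evaluate exactly, because at least k successes occur before the $v$th failure precisely when the first $k + v - 1$ fair trials contain at most $v - 1$ failures, so $\mathbb{P}(\mathrm{NB}(v, 1/2) \ge k) = \mathbb{P}(\mathrm{Bin}(k + v - 1, 1/2) \le v - 1)$. A short sketch (the helper names `nb_tail` and `largest_v` are hypothetical):

```python
from math import comb


def nb_tail(v, k):
    """P(NB(v, 1/2) >= k) for integers v >= 0, k >= 1, via P(Bin(k + v - 1, 1/2) <= v - 1)."""
    if v == 0:
        return 0.0                     # NB(0, 1/2) = 0 almost surely
    n = k + v - 1
    return sum(comb(n, i) for i in range(v)) / 2 ** n


def largest_v(k, alpha):
    """Largest integer v >= 0 satisfying (3); v = 0 means no positive v qualifies."""
    v = 0
    while nb_tail(v + 1, k) <= alpha:  # nb_tail increases to 1 in v, so this terminates
        v += 1
    return v


print(largest_v(10, 0.05))   # 4
print(largest_v(5, 0.05))    # 1
```

The second call corresponds to the 5-FWER at the 5% level used in the simulations of Section 5.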


As a concrete example, taking $v = 4$ provides 10-FWER control at level 0.05. Because $v$ must be an integer, the largest admissible $v$ is a step function of the level $\alpha$; consequently, $\mathbb{P}(V \ge k)$ is in general lower than the target level $\alpha$. In particular, since $\mathbb{P}(\mathrm{NB}(1, 1/2) \ge k) = 1/2^k$, for $\alpha < 1/2^k$ no positive integer $v$ satisfies (3), so the naive procedure must reject nothing. This matter can be easily resolved by randomizing $v$, as we show in Remark 1.

To better understand the knockoffs procedure, we may want to know how many false rejections are made when the k-FWER is not controlled, that is, on the event $V \ge k$. To this end, the following result bounds the tail probability of V, that is, the probability of making many more false rejections than expected.

Corollary 1. For arbitrary $a > 0$, the false discovery number of the knockoffs procedure with parameter $v$ obeys

$$ \mathbb{P}\big(V \ge (1 + a)v\big) \le \theta(a)^v, $$

where $\theta(a) = \dfrac{(2 + a)^{2 + a}}{2^{2 + a}(1 + a)^{1 + a}} < 1$.

Proof of Corollary 1. By Lemma 2, it suffices to prove the inequality when V is distributed as NB(v, 1/2). For any positive number $\eta < \log 2$, Markov's inequality gives

$$ \mathbb{P}\big(V \ge (1 + a)v\big) \le \frac{\mathbb{E}(e^{\eta V})}{e^{(1 + a)\eta v}} = \frac{1}{(2 - e^{\eta})^{v} \, e^{(1 + a)\eta v}}. $$

The desired bound follows from taking $\eta = \log(2 + 2a) - \log(2 + a)$. □
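As a numerical sanity check (not part of the proof), the exact negative binomial tail can be compared with the Chernoff-type bound $\theta(a)^v$ of Corollary 1. This sketch uses the same binomial identity as condition (3), with $k = \lceil (1 + a)v \rceil$; the helper names are hypothetical.

```python
import math


def nb_tail_real(v, x):
    """P(NB(v, 1/2) >= x) for integer v >= 1 and real x, using k = ceil(x)."""
    k = math.ceil(x)
    if k <= 0:
        return 1.0
    n = k + v - 1
    return sum(math.comb(n, i) for i in range(v)) / 2 ** n


def theta(a):
    """theta(a) from Corollary 1; equals 1 at a = 0 and is below 1 for a > 0."""
    return (2 + a) ** (2 + a) / (2 ** (2 + a) * (1 + a) ** (1 + a))


for a in (0.5, 1.0, 2.0):
    for v in (1, 5, 20):
        exact = nb_tail_real(v, (1 + a) * v)
        bound = theta(a) ** v
        print(f"a={a}, v={v}: exact tail {exact:.3g}, bound {bound:.3g}")
```

The bound is loose for small $v$ but decays geometrically in $v$, which is the point of the corollary.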


Remark 1 (Power Improvement). As mentioned earlier, the knockoffs procedure suffers from a discretization problem, especially for small k, but this can be remedied by randomization as follows. For any desired level $\alpha \in (0, 1)$, there must exist an integer $v \ge 0$ such that

$$ \mathbb{P}_v(V \ge k) \le \alpha < \mathbb{P}_{v+1}(V \ge k), $$

where the subscript $v$ or $v + 1$ indicates the parameter of the knockoffs procedure, and the probabilities are those given by the NB(v, 1/2) and NB(v + 1, 1/2) bounds of Lemma 2. We can devise a mixture procedure whose k-FWER bound equals $\alpha$ exactly by putting weights $\omega$ and $1 - \omega$, respectively, on the knockoffs procedures with parameters $v$ and $v + 1$, where

$$ \omega = \frac{\mathbb{P}_{v+1}(V \ge k) - \alpha}{\mathbb{P}_{v+1}(V \ge k) - \mathbb{P}_v(V \ge k)}. $$
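The randomization weight $\omega$ can be computed from the negative binomial tails. In this sketch the names `nb_tail` and `randomized_parameter` are hypothetical, the tails are the Lemma 2 bounds (so the mixture attains level $\alpha$ at the bound), and the final draw uses Python's standard `random` module purely for illustration.

```python
import random
from math import comb


def nb_tail(v, k):
    """P(NB(v, 1/2) >= k); equals 0 for v = 0."""
    if v == 0:
        return 0.0
    n = k + v - 1
    return sum(comb(n, i) for i in range(v)) / 2 ** n


def randomized_parameter(k, alpha):
    """Return (v, omega): use parameter v with probability omega, else v + 1."""
    v = 0
    while nb_tail(v + 1, k) <= alpha:
        v += 1
    lo, hi = nb_tail(v, k), nb_tail(v + 1, k)     # lo <= alpha < hi
    omega = (hi - alpha) / (hi - lo)
    return v, omega


v, omega = randomized_parameter(k=5, alpha=0.05)
print(v, omega)                                     # v = 1, omega approximately 0.76
rng = random.Random(0)
v_used = v if rng.random() < omega else v + 1       # one randomized choice of parameter
```

By construction $\omega\,\mathbb{P}_v(V \ge k) + (1 - \omega)\,\mathbb{P}_{v+1}(V \ge k) = \alpha$.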

Furthermore, as with any procedure controlling the k-FWER, power can always be improved without affecting the k-FWER by always making at least $k - 1$ rejections: whenever this modification applies, the final rejection set has only $k - 1$ elements and so cannot contain k false discoveries. In the case of knockoffs, if we were going to make fewer than $k - 1$ rejections, we can simply continue rejecting the indices with the largest $W_j$ and positive $\chi_j$ until there are $k - 1$. The benefit of this modification depends on the ordering of the hypotheses induced by the $W_j$.
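The top-up rule in the last paragraph can be sketched as follows; `top_up_rejections` is a hypothetical name, and the input `rejected` is whatever set the knockoffs procedure produced.

```python
import numpy as np


def top_up_rejections(W, chi, rejected, k):
    """Extend a rejection set to k - 1 indices, adding the largest W_j with chi_j = +1."""
    rejected = [int(j) for j in rejected]
    chosen = set(rejected)
    for j in np.argsort(-np.asarray(W, dtype=float), kind="stable"):
        if len(rejected) >= k - 1:        # already at least k - 1 rejections: stop
            break
        j = int(j)
        if chi[j] == 1 and j not in chosen:
            rejected.append(j)
            chosen.add(j)
    return sorted(rejected)


W = np.array([9.0, 8.0, 7.0, 6.0, 5.0])
chi = np.array([1, -1, 1, 1, -1])
print(top_up_rejections(W, chi, rejected=[], k=3))   # [0, 2]
```

If there are fewer than $k - 1$ indices with positive $\chi_j$, the sketch simply returns all of them.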

4 Controlling other error rates 

This paper has been about controlling the k-FWER, but the procedure introduced can be used to control other Type I error rates as well, namely the PFER and the FDX.

Originally proposed by John Tukey in unpublished work in 1953, the PFER is defined as $\mathbb{E}(V)$, or in words, the expected number of false discoveries. The control of this error rate under general p-value dependence has not received as much attention in the literature as other error rates, although both Gordon et al. (2007) and Meng et al. (2014) have discussed using the Bonferroni procedure for this purpose. Lemma 2 shows that the knockoffs procedure for controlling the k-FWER also controls the PFER at level $v$, as $\mathbb{E}(V) \le \mathbb{E}\,\mathrm{NB}(v, 1/2) = v \cdot \frac{1/2}{1 - 1/2} = v$.
The FDX, also known as the γ-false discovery proportion, the tail probability for the proportion of false positives, or the false discovery excessive probability, is the probability that the false discovery proportion (FDP) exceeds a specified bound γ. It can be viewed as a more stringent form of the FDR and has received much attention recently; see, for example, Guo et al. (2014). A number of authors have noticed its intimate connection with the k-FWER, and many of the most successful FDX-controlling procedures in the literature can be posed as meta-procedures applied to a family of k-FWER-controlling procedures (van der Laan et al., 2004; Genovese and Wasserman, 2004; Romano and Wolf, 2007). We briefly review three such meta-procedures, any one of which could be combined with the knockoffs procedure introduced here, and defer further investigation to future work.

In van der Laan et al. (2004), the authors introduced a simple and intuitive procedure that augments any FWER-controlling procedure to control the FDX. This procedure was generalized to any k-FWER-controlling procedure in Genovese and Wasserman (2006). Once the k-FWER-controlling procedure makes R rejections, if $(k - 1)/R > \gamma$, the augmentation procedure makes no rejections, but if $(k - 1)/R \le \gamma$, r more rejections can be made, where r satisfies $(k - 1 + r)/(R + r) \le \gamma$. This augmentation procedure provides exact FDX control whenever the underlying k-FWER-controlling procedure provides exact k-FWER control.
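The augmentation rule reduces to simple arithmetic: $(k - 1 + r)/(R + r) \le \gamma$ is equivalent to $r(1 - \gamma) \le \gamma R - (k - 1)$. A sketch that returns the total number of rejections (the function name is hypothetical, and which r additional hypotheses to reject is left open, as in the text):

```python
import math


def augmented_rejection_count(R, k, gamma):
    """Total rejections after augmenting R rejections of a k-FWER procedure at FDX bound gamma."""
    if R == 0 or (k - 1) / R > gamma:
        return 0                     # nothing to augment, or (k - 1)/R already exceeds gamma
    # Largest integer r with (k - 1 + r)/(R + r) <= gamma, i.e. r (1 - gamma) <= gamma R - (k - 1);
    # the tiny offset guards against floating-point error at exact boundaries.
    r = math.floor((gamma * R - (k - 1)) / (1 - gamma) + 1e-12)
    return R + r


print(augmented_rejection_count(20, 2, 0.1))   # 21
```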

Genovese and Wasserman (2004) proposed a test-inversion procedure for FDX control, similar to the closure principle of Marcus et al. (1976) for FWER control, which was then investigated further in Genovese and Wasserman (2006). The inversion procedure runs global null hypothesis tests on every subset of hypotheses, and then finds the largest subset S whose intersection with any subset for which the global null was not rejected has at most $\gamma|S|$ elements. Note that any k-FWER-controlling procedure is also trivially a test of the global null hypothesis, rejecting whenever k or more rejections are made. Rejecting S from the inversion procedure controls the FDX exactly, and although in general it takes exponential time, for some global tests it can be run in polynomial time (Genovese and Wasserman, 2004).

Given a procedure that can control the k-FWER for any $k \ge 1$, Romano and Wolf (2007) propose a heuristic that aims to control the FDX. In short, given a prescribed level γ and significance level α, both between 0 and 1, this heuristic uses a k-FWER-controlling procedure to make rejections for increasing k until just before the number of rejections exceeds $k/\gamma - 1$. Explicitly, let $R_k$ be the number of rejections made by a procedure controlling the k-FWER. Then the Romano-Wolf heuristic takes k to be the smallest k such that $R_k < k/\gamma - 1$ and makes rejections as if controlling the k-FWER. Although not rigorous due to its adaptivity in k, under some dependence assumptions the Romano-Wolf heuristic is shown to enjoy finite-sample or asymptotic FDX control for step-down procedures (Guo and Romano, 2007; Delattre and Roquain, 2013).
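The stopping rule of the Romano-Wolf heuristic, as stated above, can be sketched given the sequence $R_1, R_2, \ldots$; the function name and the rejection counts in the usage line are hypothetical, chosen only to illustrate the rule.

```python
def romano_wolf_k(R, gamma):
    """Smallest k with R_k < k / gamma - 1, where R[k - 1] holds R_k for k = 1, 2, ..."""
    for k, R_k in enumerate(R, start=1):
        if R_k < k / gamma - 1:
            return k
    return None      # condition never met among the supplied values of k


# Hypothetical rejection counts R_1, R_2, R_3, R_4 of a family of k-FWER procedures.
print(romano_wolf_k([12, 25, 27, 30], gamma=0.1))   # 3
```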

5 Comparison with other procedures 

As mentioned in the introduction, the structure of and dependence between coefficients in linear regression preclude the use of many existing procedures. The state-of-the-art procedures that can be found in the existing literature and provide exact finite-sample control of the k-FWER in linear regression are:

(a) the generic step-down procedure of Romano and Wolf (2007) applied to the least-squares 
p-values 

(b) the step-up procedure of Romano and Shaikh (2006) applied to the least-squares p-values 
(c) the adaptation of Holm's procedure to the k-FWER (Lehmann and Romano, 2005), applied to the least-squares p-values

(d) for 1-FWER, the Lasso pathwise testing procedure of Lee et al. (2013) 

(e) also for the 1-FWER, the closure of any global hypothesis testing procedure, such as the 
test, that can be applied to p-values with any known dependence, applied to the least-squares 
p-values 

Procedure (d) requires the user to know $\sigma^2$ exactly, and both (d) and (e) require computation that is exponential in the dimension p, making them infeasible for problems of even moderate size. As a result, we only compare our procedure to (a), (b), and (c). It should be noted that the problem dimensions we considered in simulations were still limited by procedure (b), whose computation time is $O(p^{k-1})$, since each threshold is computed as a maximum over subsets of size $k - 1$ from a superset of size up to p. There are also works that obtain asymptotic control of the FWER under some assumptions on the distribution of the design matrix (see, for example, Chernozhukov et al. (2013); Javanmard and Montanari (2014)). As knockoffs requires no distributional assumptions on the design matrix and controls the error rates exactly, we do not compare to such works here.

In each of the following simulations, we performed many independent experiments to gauge how the performance of knockoffs, both in absolute terms and relative to previous methods, depends on correlation in the columns of X, the sparsity of β, and the signal-to-noise ratio. In each experiment, X is generated by normalizing the columns of a multivariate Gaussian matrix with independent and identically distributed rows, and β is generated by setting a prespecified number of entries to zero and setting the rest to the same nonzero magnitude, which is also prespecified. The following experiments are all performed in the sparse setting, as that is the setting for which the canonical Lasso-based statistics W are best suited. However, nothing about using the knockoffs framework to control a Type I error rate is particularly tied to sparsity, and it is of continuing interest to find different statistics W that achieve high power in all manner of settings. In all the following simulations, n = 1000, p = 450, and $\sigma^2 = 25$; we control the 5-FWER at the 5% level, and we apply the modifications in Remark 1. The step-up procedure is implemented using the critical values suggested in Romano and Shaikh (2006), namely their Equation (13). For the sake of reproducibility, the code to generate these figures is available at http://wjsu.web.stanford.edu/code.html.
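A minimal sketch of one simulated problem in the spirit of the design described above, not the authors' released code (see the URL above for that). Assumptions, labeled here and in the comments: the "pairwise correlation" is modeled as an equicorrelated covariance $(1 - \rho)I + \rho\mathbf{1}\mathbf{1}^\top$, the nonzero coefficients are all positive, and `simulate_design` is a hypothetical name.

```python
import numpy as np


def simulate_design(n, p, rho, n_nonzero, magnitude, sigma, rng):
    """One simulated regression problem (assumptions noted inline)."""
    # Assumption: rows i.i.d. N(0, Sigma) with equicorrelated Sigma = (1 - rho) I + rho 11^T.
    Sigma = (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))
    L = np.linalg.cholesky(Sigma)
    X = rng.standard_normal((n, p)) @ L.T
    X /= np.linalg.norm(X, axis=0)                    # unit l2-norm columns
    beta = np.zeros(p)
    support = rng.choice(p, size=n_nonzero, replace=False)
    beta[support] = magnitude                         # assumption: all nonzeros positive
    y = X @ beta + sigma * rng.standard_normal(n)
    return X, beta, y


rng = np.random.default_rng(0)
X, beta, y = simulate_design(n=1000, p=450, rho=0.3, n_nonzero=10,
                             magnitude=10.0, sigma=5.0, rng=rng)   # sigma^2 = 25
```

The first experiment below corresponds to varying `rho` from 0 to 0.5 with 10 nonzero coefficients of magnitude 10.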

Our first experiment took β to have 10 nonzero elements, all with magnitude 10, and varied the pairwise correlation between the columns of X from 0 to 0.5. Figure 1 shows the power of the knockoffs procedure nearly doubling that of all alternative procedures. The power and 5-FWER of all four procedures are largely unaffected by the correlation in the columns of X.

Our second experiment generated the columns of X independently and varied the sparsity of β, with each nonzero coefficient having magnitude 10. Figure 2 shows the power of the knockoffs procedure approximately doubling that of all alternative procedures in the sparsest regime and gradually losing its advantage as the fraction of nonzero coefficients approaches 10%. The 5-FWER of the knockoffs and step-down procedures decreases as the coefficient vector becomes less sparse, with that of knockoffs becoming conservative especially quickly.

Our third experiment generated independent columns for X, used β with 10 nonzero entries, and varied the magnitude of the nonzero entries on a logarithmic scale. Figure 3 shows the power of the knockoffs procedure above all alternative procedures in the low- to middle-power regimes, while it actually has slightly less power in the very high-power regime, corresponding to a signal-to-noise ratio $\|\beta\|_2^2/\sigma^2 > 350$. This reversal can be explained by the fact that with non-orthogonal columns and a not-extremely-sparse β, the Lasso will not perfectly select all signal variables before the non-signal variables, even when the signal-to-noise ratio is extremely high (Su et al., 2015). As such, the Lasso-based W statistic used in knockoffs never achieves a power of 1; this phenomenon could be remedied by using one of the least-squares-based W mentioned in Barber and Candes (2015). The 5-FWER of all four procedures is again largely unaffected by the coefficient magnitude.

Figure 1: Comparison of Holm's procedure, the generic step-down procedure, the step-up procedure, and knockoffs for controlling the 5-FWER at the 5% significance level. As functions of the column correlation of the design matrix, the procedures' powers are shown in (a), while the 5-FWER is given in (b), with the grey line denoting the nominal level of 5%. The curves for Holm and step-up lie on top of one another. Each point is an average over 2000 simulations.

Figure 2: Comparison of Holm's procedure, the generic step-down procedure, the step-up procedure, and knockoffs for controlling the 5-FWER at the 5% significance level. As functions of the number of nonzero coefficients, the procedures' powers are shown in (a), while the 5-FWER is given in (b), with the grey line denoting the nominal level of 5%. The curves for Holm and step-up lie on top of one another. Each point is an average over 2000 simulations.

Figure 3: Comparison of Holm's procedure, the generic step-down procedure, the step-up procedure, and knockoffs for controlling the 5-FWER at the 5% significance level. As functions of the magnitude of the nonzero coefficients, the procedures' powers are shown in (a), while the 5-FWER is given in (b), with the grey line denoting the nominal level of 5%. Each point is an average over 2000 simulations.

6 Real data experiment 

In this section, we apply our method to a data set on HIV drug resistance. Specifically, the data set, described and analyzed in Rhee et al. (2006) and also used in the original knockoffs paper (Barber and Candes, 2015), contains genotype information from samples of HIV Type 1, along with drug resistance measurements for 16 drugs across three classes. The three classes are protease inhibitors (PI), nucleoside reverse transcriptase inhibitors (NRTI), and nonnucleoside reverse transcriptase inhibitors (NNRTI), each of which has its own set of samples. Drug resistance was measured as the log-fold increase of resistance compared with a control, and the genetic information comes in the form of single nucleotide polymorphisms (SNPs), so that each binary value represents the presence or absence of a minor allele at a given locus.

In order to analyze the data, some cleaning was required. In particular, some samples do not have resistance measurements for some of the drugs, so these samples were removed on a drug-by-drug basis. Also, some SNPs have so few mutations that either their effect would be too hard to detect or their inclusion would cause rank deficiency in the design matrix. As such, for each drug we only included polymorphisms with at least five mutations present in the culled sample; this was the minimum required to ensure that all design matrices were full rank.
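The per-drug cleaning steps can be sketched as follows. Assumptions: missing resistance measurements are coded as NaN in the response, the design is a 0/1 matrix with 1 meaning the mutation is present, and `clean_drug_data` and the synthetic toy data are hypothetical (the real data set of Rhee et al. (2006) is not reproduced here).

```python
import numpy as np


def clean_drug_data(X, y, min_mutations=5):
    """Drop samples without a resistance measurement, then keep mutation columns
    observed at least `min_mutations` times in the remaining (culled) samples."""
    keep_rows = ~np.isnan(y)                      # assumption: missing responses coded as NaN
    X, y = X[keep_rows], y[keep_rows]
    keep_cols = X.sum(axis=0) >= min_mutations    # assumption: X is 0/1 (1 = mutation present)
    X = X[:, keep_cols]
    if np.linalg.matrix_rank(X) < X.shape[1]:
        raise ValueError("design matrix is still rank-deficient")
    return X, y, keep_cols


rng = np.random.default_rng(0)
X_raw = (rng.random((200, 8)) < 0.2).astype(float)   # synthetic toy data
X_raw[:, 7] = 0.0
X_raw[:3, 7] = 1.0                                    # a rare mutation seen in only 3 samples
y_raw = rng.standard_normal(200)
y_raw[[5, 17]] = np.nan                               # two samples lack a measurement
X_clean, y_clean, kept = clean_drug_data(X_raw, y_raw)
print(X_clean.shape, kept)
```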

We compare our knockoffs procedure to the step-down, step-up, and Holm procedures, as well as to the original knockoffs procedure for controlling the FDR at level q. As the k-FWER is often used for exploratory analysis, and to make the analysis comparable with knockoffs for FDR control, we set α = 0.5 (the FDR controls a mean, and with α = 0.5 the k-FWER controls a median). We set k = 2 and q = 0.2, and ran all five procedures on all 16 drugs; the results are summarized in Table 1.

Table 1: Multiple testing procedures applied to HIV drug resistance data sets

Drug   Type   Samples  SNPs  FDR ko  k-FWER ko  Step-down  Step-up  Holm
APV    PI     767      164   19/29   10/10      14/18      14/15    14/17
ATV    PI     328      104   22/28   18/19      18/20      14/14    17/19
IDV    PI     825      165   25/42   15/17      17/21      17/20    17/20
LPV    PI     515      141   17/18   13/14      17/18      13/13    14/14
NFV    PI     842      166   26/40   20/22      17/21      16/18    17/21
RTV    PI     793      163   20/26   18/18      17/23      15/17    15/20
SQV    PI     824      164   20/31   19/29      16/21      15/18    15/19
X3TC   NRTI   629      216   4/6     5/7        6/9        5/6      6/8
ABC    NRTI   623      216   16/35   16/31      8/11       8/11     8/11
AZT    NRTI   626      216   15/21   13/17      13/21      10/14    11/18
D4T    NRTI   625      216   15/26   13/21      11/12      10/11    10/11
DDI    NRTI   628      216   2/2     5/5        8/13       7/9      8/12
TDF    NRTI   351      148   6/6     8/8        9/11       7/8      9/10
DLV    NNRTI  730      231   10/25   10/16      11/25      11/20    11/22
EFV    NNRTI  732      236   11/21   11/19      10/17      10/16    10/16
NVP    NNRTI  744      236   10/23   8/13       7/15       7/12     7/13
Average true discoveries     14.9    12.6       12.4       11.2     11.8
2-FWER                       0.81    0.63       0.88       0.63     0.81

Summary: For each procedure, we report the number of true positives and the number of total discoveries, separated by a slash. The last two rows report summary statistics for each procedure: the average number of true discoveries over the 16 drugs, and the empirical 2-FWER, i.e., the fraction of drugs with at least two false discoveries. "ko" stands for knockoffs.
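For the setting k = 2 and α = 0.5 used in this experiment, condition (3) of Theorem 1 can be worked out by hand. This is our own arithmetic from the negative binomial probabilities $\mathbb{P}(\mathrm{NB}(v, 1/2) = i) = \binom{v+i-1}{i} 2^{-(v+i)}$; the text does not state which $v$ was used.

```latex
\begin{aligned}
\mathbb{P}\big(\mathrm{NB}(v, 1/2) \ge 2\big)
  &= 1 - \mathbb{P}\big(\mathrm{NB}(v, 1/2) = 0\big) - \mathbb{P}\big(\mathrm{NB}(v, 1/2) = 1\big) \\
  &= 1 - 2^{-v} - v\, 2^{-(v+1)} = 1 - \frac{v + 2}{2^{v+1}}, \\
1 - \frac{v + 2}{2^{v+1}} \le \frac{1}{2}
  &\iff v + 2 \ge 2^{v} \iff v \le 2 \quad (v \ge 0 \text{ an integer}).
\end{aligned}
```

So (3) gives $v = 2$, at which the negative binomial bound equals exactly 1/2; under this reading, the randomization of Remark 1 would put all its weight on $v = 2$.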

Although the ground truth is unknown in this case, an approximate ground truth exists in the form of treatment-selected mutation (TSM) panels (Rhee et al., 2005). These panels list mutations that were found to be significantly more frequent in virus samples from individuals treated with a drug in that class than in samples from individuals who had not been treated with such a drug. Thus, in evaluating our experiment, we consider a SNP discovery for a given drug to be true if it has a mutation listed in the TSM panel for that drug's class.

Table 1 shows the number of true discoveries and total discoveries made by each method on each data set. As expected, FDR-controlling knockoffs was more powerful than any of the k-FWER-controlling procedures, but it is harder to interpret, as it never makes a very large number of discoveries and thus the FDP may be quite different from q. The remaining procedures have varying levels of 2-FWER, but recall that the reported error rates are likely to be overestimates, as there may be important SNPs that the TSM panels missed. Still, we see that on this data set the step-down and Holm procedures commit more 2-familywise errors than knockoffs, while the step-up procedure has over 10% less power than knockoffs.

7 Discussion 

This work leaves a number of important avenues open for future research. First, we mentioned in Section 4 a number of methods that translate k-FWER-controlling procedures into procedures for controlling the FDX. Investigating the best such method could yield a powerful method for controlling another important Type I error rate. Second, Barber and Candes (2015) mention in passing the possibility of multiple knockoffs, i.e., constructing $m > 1$ sets of knockoffs and replacing the one-bit p-values corresponding to the $\chi_j$'s with p-values discretized to $m + 1$ levels. In the setting of FDR control, one can search over many one-bit p-values and need only consider what fraction, on average, may be false discoveries. However, to control the k-FWER, one must keep track of every false discovery, and we may expect the extra resolution of multiple knockoffs to provide more power to distinguish true discoveries from false ones. Lastly, we feel the knockoffs framework is still a largely untapped resource for generating multiple testing procedures. The investigation of alternative $W_j$ statistics for ordering variables, and the extension to other regression settings such as logistic regression and higher-dimensional problems ($p > n$), are all important open subjects.

We have presented a novel method for controlling the k-FWER in the context of linear regression. Knockoffs requires no knowledge of the noise variance and implicitly takes into account the exact dependence structure of the problem, allowing it to provide considerable power improvements over state-of-the-art alternatives in a range of settings. This, along with its intuitive justification and ease of computation, makes knockoffs a useful practical tool for multiple hypothesis testing.